Successive approximations for Markov decision processes and Markov games with unbounded rewards

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Successive approximations for Markov decision processes and Markov games with unbounded rewards

• A submitted manuscript is the author's version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website. • The final author version ...

متن کامل

Denumerable Undiscounted Semi-Markov Decision Processes with Unbounded Rewards

This paper establishes the existence of a solution to the optimality equations in undiscounted semi-Markov decision models with countable state space, under conditions generalizing the hitherto obtained results. In particular, we merely require the existence of a finite set of states in which every pair of states can reach each other via some stationary policy, instead of the traditional and re...

متن کامل

Markov Decision Processes with Functional Rewards

Markov decision processes (MDP) have become one of the standard models for decision-theoretic planning problems under uncertainty. In its standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model allowing rewards to be functional. The value of a history is recursively computed by composing the reward functions. We show that sev...

متن کامل

Markov decision processes with fuzzy rewards

In this paper, we consider the model that the information on the rewards in vector-valued Markov decision processes includes imprecision or ambiguity. The fuzzy reward model is analyzed as follows: The fuzzy reward is represented by the fuzzy set on the multi-dimensional Euclidian space R and the infinite horizon fuzzy expected discounted reward(FEDR) from any stationary policy is characterized...

متن کامل

Interactive Value Iteration for Markov Decision Processes with Unknown Rewards

To tackle the potentially hard task of defining the reward function in a Markov Decision Process, we propose a new approach, based on Value Iteration, which interweaves the elicitation and optimization phases. We assume that rewards whose numeric values are unknown can only be ordered, and that a tutor is present to help comparing sequences of rewards. We first show how the set of possible rewa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Mathematische Operationsforschung und Statistik. Series Optimization

سال: 1979

ISSN: 0323-3898

DOI: 10.1080/02331937908842597